(57) Abstract: Methods and apparatus for implementing an enhanced digital signal processor through the addition of modular computation units which can be operated in parallel are described. In various embodiments the computation units are implemented as configurable computation cells which are arranged to form a computation engine which supplements conventional DSP circuitry. The computation cells can be used to perform frequently used DSP functions, such as cross-correlation, sorting, and FIR filtering, quickly and without the need for extensive iterative processing. By using the computation cells of the present invention in parallel, common DSP functions can be computed quickly, resulting in improved DSP performance as compared to conventional DSPs.
METHODS AND APPARATUS FOR ENHANCING DIGITAL SIGNAL PROCESSORS

Field of the Invention

The present invention relates to methods and apparatus for performing digital signal processing operations and, more specifically, to methods and apparatus for enhancing digital signal processors.

Background 

As technology for digital electronics has advanced, digital signal processing using digital computers and/or customized digital signal processing circuits has become ever more important. Applications for digital signal processing include audio, video, and speech processing, communications, system control, and many others. One particularly interesting application for digital signal processing is the communication of audio signals over the Internet.

The transmission of audio signals over the Internet offers the opportunity to communicate voice signals, in digital form, anywhere in the world at relatively little cost. As a result, there has been ever-growing interest in voice transmission over the Internet. In fact, Internet telephony is a fast-growing business area due to its promise of reducing and/or eliminating much of the cost associated with telephone calls. DSPs are frequently used to support Internet telephony and other applications which may be required to process digital audio and/or video signals.

Thus, DSPs used to process audio signals are found in digital telephones, audio add-in cards for personal computers, and a wide variety of other devices. In addition to processing audio signals, a single DSP may be called upon to process a wide range of digital data, including video data and numeric data.

Digital audio and/or video files or data streams representing sampled audio and/or video images can be rather large. Data compression is frequently used to reduce the amount of memory required to store such files and/or the amount of bandwidth required to transmit them. In order to determine if a specific set of data, e.g., a subset of the data being compressed, will benefit from compression processing, a correlation operation is often performed. Data compression is then performed on subsets of the data being processed as a function of the output of the correlation operation. Accordingly, correlation operations are frequently performed when processing audio data, video data and other types of data.

As will be discussed in detail below, cross-correlation generally involves processing two sequences of numbers, each sequence including, e.g., N elements, to produce an output sequence which also has N elements, where N may be any positive integer. Each element of the input and output sequences is normally a number represented by one or more bits. Cross-correlation processing generally requires N multiplications and N-1 additions to produce each of the N output elements. Thus, a total of $N^2$ multiplications and $N^2 - N$ additions must normally be performed to produce an N-element cross-correlation output.
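To make this operation count concrete, the Python sketch below computes an N-element cross-correlation directly and tallies the multiplications and additions it performs. It assumes circular indexing, r[k] = sum over n of x[n]*y[(n+k) mod N], so that every output element uses all N products; that is the convention under which the per-output counts stated above hold exactly. The function name `cross_correlate_counted` is illustrative only.

```python
def cross_correlate_counted(x, y):
    """Circular cross-correlation r[k] = sum_n x[n] * y[(n + k) % N].

    Returns (r, multiplications, additions) so the totals can be compared
    with the N^2 and N^2 - N figures given in the text.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    if n == 0:
        return [], 0, 0
    r = []
    mults = adds = 0
    for k in range(n):
        acc = x[0] * y[k]          # first product of each output needs no addition
        mults += 1
        for i in range(1, n):
            acc += x[i] * y[(i + k) % n]
            mults += 1
            adds += 1
        r.append(acc)
    return r, mults, adds


r, mults, adds = cross_correlate_counted([1, 2, 3, 4], [4, 3, 2, 1])
print(r, mults, adds)  # [20, 26, 28, 26] 16 12
```

For N = 4 this performs 16 = 4^2 multiplications and 12 = 4^2 - 4 additions.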

From a cost standpoint, it is desirable to avoid building into a DSP a large amount of customized circuitry which is likely to be used only infrequently or to go unused altogether. In typical DSP applications, software is normally used to configure adders, subtracters, multipliers and registers to perform various functions. In some cases, additional specialized circuitry may be included in the DSP. For example, some DSPs include a relatively small number, e.g., two, of Multiply-and-Accumulate (MAC) processing units. The MAC processing units can be used to multiply two numbers and add the result into a storage register sometimes called an accumulator. MAC units may be reused under software control.

WO 02/12978    PCT/US01/24667

Since the number of MAC units in typical DSPs is relatively limited, computationally intensive calculations such as, e.g., cross-correlation, normally have to rely on software loops and/or multiple processing iterations to be completed.
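The cost of relying on a few MAC units can be estimated with simple arithmetic. The sketch below is an idealized model, assuming each MAC unit completes one multiply-accumulate per clock cycle with no loop, load, or store overhead (real software loops add overhead on top of this). It compares that with a bank of parallel cells, assuming each cell accumulates one output element while all N input samples are streamed past it once per pass, with read-out time ignored. Both function names are hypothetical.

```python
import math


def mac_loop_cycles(n, num_macs):
    """Idealized cycles for an N-element cross-correlation (N*N MACs)
    spread over `num_macs` MAC units, one MAC per unit per cycle."""
    return math.ceil(n * n / num_macs)


def parallel_cell_cycles(n, num_cells):
    """Idealized cycles when `num_cells` cells each accumulate one output
    element, streaming all N samples once per pass (read-out ignored)."""
    passes = math.ceil(n / num_cells)
    return passes * n


for n in (40, 60, 240):
    print(n, mac_loop_cycles(n, 2), parallel_cell_cycles(n, n))
```

Under these assumptions a 240-element cross-correlation needs 28,800 cycles on two MAC units but about 240 cycles on 240 parallel cells.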

In addition to cross-correlation, other frequently used DSP functions include sorting, finite impulse response filtering, convolution, vector sum, vector product, and min/max selection. In many applications, such functions involve arithmetic calculations applied to long sequences of numbers representing discrete signals.
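As a point of reference for what two of these functions compute, the following Python sketch gives plain software versions of a direct-form FIR filter, y[n] = sum over k of h[k]*x[n-k] with zero initial state, and of min/max selection over a block. These are ordinary textbook definitions used only for illustration; the function names are hypothetical and they are not the computation cells described later.

```python
def fir_filter(h, x):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], with x[m] = 0 for m < 0.
    Produces one output per input sample."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


def min_max(block):
    """Return (minimum, maximum) of a non-empty block of samples."""
    return min(block), max(block)


print(fir_filter([1, 1], [1, 2, 3]))  # [1, 3, 5]
print(min_max([3, -7, 12, 0]))        # (-7, 12)
```

Each FIR output costs len(h) multiply-accumulates, which is why such loops dominate DSP run time when performed iteratively.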

In many applications, the amount of time available to process a set of data is limited by real-world constraints, such as the rate at which digital data representing an audio signal will be used to generate audio signals that are presented to a listener. The term real-time processing is often used to refer to processing that needs to be performed at or near the rate at which data is generated or used in real-world applications. In the case of audio communications systems, such as telephones, failure to process audio in or near real time can result in noticeable delays, noise, and/or signal loss.

While the use of iterative loops to perform signal processing operations serves to limit the need for specialized circuitry in a DSP, it also means that DSPs often need to support clock speeds which are much higher than would be required if computationally complex operations could be performed without iterative processing operations, or with fewer of them.

In view of the above discussion, it is apparent that there is a need for methods and apparatus which can be used to reduce the need for iterative processing operations in DSPs. From an implementation standpoint, it is desirable that any new circuitry be modular in design. It is also desirable that circuitry used to implement at least some new methods and apparatus be capable of supporting one or more common DSP processing operations. In addition, from a hardware efficiency standpoint, it would be beneficial if at least some circuits were easily configurable so that they could be used to support multiple DSP processing operations.

SUMMARY OF THE INVENTION

The present invention is directed to methods and apparatus for improving the way in which digital signal processors perform a wide variety of common operations, including cross-correlation, sorting, and finite impulse response filtering, in addition to other operations which use multiply, add, subtract, compare and/or store functionality.

In accordance with various embodiments of the present invention, digital signal processors and/or other programmable circuits are enhanced through the addition of one or more computation engines. The computation engines of the present invention are of a modular design, with each computation engine being constructed from a plurality of computation cells, each of which may be of the same design. The computation cells are connected to form a sequence of cells capable of performing processing operations in parallel.

In embodiments where the computation results are read out of the last computation cell in a sequence of computation cells, the values resulting from the processing of each computation cell can be shifted out of the computation engine, with the results being passed from computation cell to computation cell so that the results of multiple cells can be read.
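This read-out scheme can be sketched as a chain of result registers in which only the last one is visible at the engine output; on each shift clock every cell takes its predecessor's value. The Python model below is a simplified illustration: the function name is hypothetical, and it assumes a zero is shifted into the first cell.

```python
def shift_out(results):
    """Read M per-cell results through the last cell of a chain.

    results[i] is the value held by cell i (cell 0 is the first cell).
    Each iteration reads the last cell's output, then clocks the chain so
    that every cell takes the value held by the preceding cell.
    """
    regs = list(results)
    read = []
    for _ in range(len(regs)):
        read.append(regs[-1])      # engine data output = last cell's output
        regs = [0] + regs[:-1]     # shift toward the output; 0 enters cell 0
    return read


print(shift_out([10, 20, 30]))  # [30, 20, 10]
```

Reading M results therefore takes M shift clocks, and the results emerge last cell first.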

The computation cells of the present invention may be implemented to perform a specific function such as cross-correlation, sorting or filtering. Thus, a computation engine may be dedicated to supporting a particular function such as cross-correlation.

However, in other embodiments, the computation cells are designed to be configurable, allowing a computation engine to support a wide range of applications.



One or more multiplexers may be included in each computation cell to allow reconfiguring of the computation cell, and thus of how signals are routed between the computation cell components and which computation cell components are used at any given time.

By reconfiguring the way in which signals are supplied to the internal components of the computation cells and the way in which signals are passed between computation cell components, multiple signal processing operations can be performed using the same computation cell hardware.

A control value supplied to each computation cell in a computation engine can be used to control the components of the computation cells and how each of the computation cells is configured. In some embodiments, e.g., embodiments which support sorting, the configuration of a computation cell is also controlled, in part, by a cascade control signal generated by a preceding computation cell in the sequence of computation cells.

A control register may be included in the computation engine for storing the control value used to control the configuration of the individual computation cells included in the computation engine. The output of the control register is supplied to a control value input of each of the computation engine's computation cells. Thus, the configuration of the computation engine's computation cells can be modified by simply writing a new control value into the control register.
A control value may be several bits, e.g., 12 bits, in length. In one embodiment, different fields of the 12-bit control value are dedicated to controlling different elements of the computation cells. For example, different bits may be dedicated to controlling different multiplexers, while another set of bits is dedicated to controlling the resetting of values stored in computation cell storage devices, and yet another bit controls whether an adder/subtractor performs addition or subtraction.
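The field-based control value can be illustrated with a small encoder/decoder. The bit layout below is entirely hypothetical (the text says only that different fields control multiplexers, storage resets and the add/subtract selection); the names M3CM, M4CM, S1R and S3R are borrowed from the Fig. 4 discussion for readability. Because every cell decodes the same register output, writing one value reconfigures all cells at once.

```python
from dataclasses import dataclass

# Hypothetical 12-bit layout (not specified in the text):
#   bits 0-1   M3CM      MUX3 select
#   bits 2-3   M4CM      MUX4 select
#   bit  4     S1R       reset STORAGE1
#   bit  5     S3R       reset STORAGE3
#   bit  6     SUBTRACT  1 = adder/subtractor subtracts, 0 = adds
#   bits 7-11  reserved for other multiplexer selects


@dataclass(frozen=True)
class ControlFields:
    m3cm: int
    m4cm: int
    s1r: bool
    s3r: bool
    subtract: bool


def decode(value: int) -> ControlFields:
    """Split a 12-bit control value into its (hypothetical) fields."""
    if not 0 <= value < (1 << 12):
        raise ValueError("control value must fit in 12 bits")
    return ControlFields(
        m3cm=value & 0b11,
        m4cm=(value >> 2) & 0b11,
        s1r=bool((value >> 4) & 1),
        s3r=bool((value >> 5) & 1),
        subtract=bool((value >> 6) & 1),
    )


def encode(f: ControlFields) -> int:
    """Pack the fields back into a control value for the control register."""
    return ((f.m3cm & 0b11)
            | ((f.m4cm & 0b11) << 2)
            | (int(f.s1r) << 4)
            | (int(f.s3r) << 5)
            | (int(f.subtract) << 6))


accumulate = ControlFields(m3cm=0, m4cm=1, s1r=False, s3r=False, subtract=False)
print(encode(accumulate))  # 4
```

Software would write `encode(...)` into the control register; each cell (or a shared decoder) applies `decode` to drive its multiplexers and reset lines.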

In accordance with the present invention, a software controllable portion of a digital signal processor can be used to control the configuration of a computation engine of the present invention by periodically storing an updated control value in the computation engine's control register. In addition, the software controllable portion of the digital signal processor can supply data to be processed to one or more data inputs included in the computation engine and receive, e.g., read out, the results of a processing operation performed by the computation engine of the present invention.

Both the software controllable digital signal processing circuitry and the computation engine of the present invention are, in various embodiments, implemented on the same semiconductor chip.
Because the present invention allows all or portions of many processing operations to be performed in parallel through the use of multiple computation circuits, processing efficiencies can be achieved as compared to embodiments where software loops are used in place of the parallel hardware circuits of the present invention.



Additional features, embodiments and benefits of the methods and apparatus of the present invention will be discussed in the detailed description which follows.



BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 illustrates an enhanced signal processor 
implemented in accordance with the present invention. 

Fig. 2 illustrates a computation engine implemented in accordance with one exemplary embodiment of the present invention.

Fig. 3 illustrates a multi-purpose computation engine implemented in accordance with another embodiment of the present invention.



Fig. 4 illustrates a cross-correlation 
computation cell of the present invention suitable for 
use in the computation engine illustrated in Fig. 2. 
Fig. 5 illustrates a sorting cell of the present invention suitable for use in the computation engine illustrated in Fig. 2.


Fig. 6 illustrates an FIR filter cell of the 
present invention suitable for use in the computation 
engine illustrated in Fig. 2. 

Fig. 7 illustrates a multi-purpose configurable computation cell of the present invention which may be used in the computation engines illustrated in Figs. 2 or 3.

Fig. 8 illustrates control logic which may be used as the control logic of the multi-purpose configurable computation cell illustrated in Fig. 7.

DETAILED DESCRIPTION 

The present invention relates to methods and apparatus for enhancing digital signal processors. Fig. 1 illustrates a digital signal processor 100 implemented in accordance with the present invention. As illustrated, the DSP 100 includes first and second programmable processing circuits 102, 102'. The programmable processing circuits 102, 102' include data inputs via which they can receive one or more data streams representing, e.g., sampled signals. Each data stream may correspond to, e.g., one or more physical or virtual voice channels. The programmable processing circuits 102, 102' process the received signals under control of software 104, 104' and in conjunction with computation engine 103, which is coupled to the programmable processing circuits 102, 102'. Data input control circuit 106, which may be implemented using a multiplexer, determines which programmable processing circuit 102, 102' supplies data to the computation engine 103 at any given time. Data input control 106 is responsive to a control signal received from programmable processing circuit 102 to determine which of the two processors 102, 102' will supply data to the computation engine at any given time.

The processing performed by processing circuits 102, 102' operating in conjunction with the computation engine 103 may include various types of voice signal processing, e.g., data compression/decompression, filtering, and/or sorting, such as identification of maximum values, e.g., amplitude values, in blocks of data being processed. The data compression/decompression operation may involve performing one or more correlation operations. The filtering may be, e.g., finite impulse response (FIR) filtering performed on one or more voice signals being processed. The sorting operation may involve identifying the maximum and/or minimum signal amplitude values in a block of data being processed.
Programmable processors 102, 102' share the computation engine as a common resource which can be used on a time-shared basis. The computation engine 103, in accordance with the present invention, receives data and configuration information from one or both of the programmable processors 102, 102'.

Both the computation engine 103 and processing circuits 102, 102' may be implemented on the same semiconductor chip to form a single-chip implementation of the enhanced DSP 100.

Programmable processing circuitry 102, 102' is capable of performing operations under software control, as is done in conventional DSP circuits. However, it also has the ability to control the configuration of the computation engine 103, to send data for processing to the computation engine 103, and to receive the results of processing operations performed by the computation engine 103. As will be discussed below, the computation engine 103 includes hardware circuitry which allows parallel processing operations to be performed on received data, thereby facilitating many common operations, e.g., cross-correlation, sorting, and FIR filtering, to name but a few examples.

The processing circuits 102, 102' receive digital data, e.g., sampled signals, to be processed. When a processing operation needs to be performed for which the computation engine 103 can be used, the processing circuits 102, 102' pass the data to be processed to the computation engine 103 and then receive back from the computation engine the result of the processing operation.

The data received back from the computation engine 103 may be used in further processing performed by the processing circuitry 102, 102'. The circuitry 102, 102' outputs processed signals, e.g., digital data, produced by the processing it performs. In some cases, the output of the computation engine 103 is used directly as the processed signal output of the digital signal processor 100.

In accordance with the present invention, the programmable processor 102 can configure the computation engine to perform any one of a variety of supported operations. Thus, e.g., the programmable processor 102 may control the computation engine to first perform a correlation operation, then a filtering operation, which may then be followed by another correlation operation or even a sorting operation. Virtually any order of supported operations is possible. Different operations may be used for processing different data streams, e.g., corresponding to different voice channels or paths, whether physical or virtual. In a similar manner, processing circuit 102' can control the configuration, and thus the processing, of computation engine 103. Thus, the first programmable processing circuit 102 may use the computation engine to perform one or more correlation operations while the second processing circuit 102' controls the computation engine to perform one or more other processing operations, e.g., FIR filtering or sorting.

While Fig. 1 illustrates two processing circuits 102, 102', it should be understood that additional processing circuits may share the computation engine 103 and/or the computation engine may be paired with a single processing circuit 102.

Fig. 2 illustrates a first exemplary computation engine 200 which may be used as the computation engine illustrated in Fig. 1. As illustrated, the exemplary computation engine includes a plurality of M computation cells 202, 204, 206 which are coupled together to form a sequence or cascade of first through Mth computation cells. As will be discussed below, each of the computation cells 202, 204, 206 may be implemented using the same or similar circuitry. This has the advantage of allowing for a simple and consistent method of controlling each of the computation cells 202, 204, 206. It can also help simplify the manufacturing of the computation engine 200 by avoiding the need to manufacture multiple unique circuits to implement each computation cell.
In the Fig. 2 embodiment, the computation cells are controlled in unison by bits of a global control value. The global control value is loaded into a global control register 208 which is coupled to a global control value input of each of the computation cells 202, 204, 206. In some embodiments, configuration of individual cells is further controlled by a cascade control signal which is generated by a preceding computation cell when a preceding computation cell is present.

Each computation cell 202, 204, 206 has several inputs and at least one output. The inputs include a data input, a broadcast input, and an optional cascade control signal input. The outputs of each computation cell include a data signal output and, optionally, a cascade control signal output. Each signal input and output may correspond to one or more signal lines.

The data input of the computation engine 200 is coupled to the Broadcast input of each of the first through Mth computation cells 202, 204, 206. In this manner, each computation cell is supplied with the input data received by the computation engine 200. The data input of the computation engine 200 is also coupled to the data input of the first computation cell 202.

The data output of each computation cell is coupled to the data input of the next computation cell in the sequence of M computation cells. The data output of the last (Mth) computation cell 206 is coupled to the data output of the computation engine 200.

In addition to data inputs and outputs, each computation cell may include an optional cascade control input and cascade control output. Since these signal inputs and outputs may be omitted in some embodiments, they are shown in Fig. 2 using dashed lines. When present, the cascade control input of the first computation cell 202 is supplied with a constant value, e.g., 0. The cascade control outputs of the first through (M-1)th computation cells, when present, are coupled to the cascade control input of the subsequent computation cell. The cascade control output of the Mth computation cell goes unused.

In various embodiments, the data input of the computation engine 200 and of each computation cell 202, 204, 206 includes up to three distinct parallel data inputs. The data outputs of the computation engine and of each computation cell normally include the same number of distinct parallel data outputs as inputs.

The data input and data output of each computation cell 202, 204, 206 may be implemented as a single, double or triple data path. In the case of a triple data path implementation, the three data signals which may be received via the data input are DATA1, DATA2 and DATA3. Similarly, DATA1, DATA2 and DATA3 output signals may be generated by a computation cell. For each received data input signal, a corresponding data output signal is normally generated and supplied to the corresponding data input of the next computation cell in the sequence of cells 202, 204, 206. In the case where multiple parallel inputs are supported as part of the data input to each computation cell, one or more of the data inputs may be active at any time, depending on the particular implementation and the processing being performed.

In a similar manner, the Broadcast input may be implemented as a single or a double input. In some embodiments, a single Broadcast input is used and a single broadcast signal is supplied to each of the computation cells, while in other embodiments two Broadcast inputs are used, allowing up to two broadcast signals, Broadcast1 and Broadcast2, to be received by each computation cell. Each Broadcast signal corresponds to a different one of the Data Input signals which may be supplied via parallel paths to the computation engine. Thus, via the Broadcast input, a Broadcast1 and/or Broadcast2 signal can be received. The Broadcast1 input of each computation cell 202, 204, 206, when present, is coupled to the DATA1 input of the computation engine 200 and therefore receives the same input signal as the DATA1 input of the first computation cell 202. The Broadcast2 input of each computation cell 202, 204, 206, when present, is coupled to the DATA2 input of the computation engine and therefore receives the same input signal as the DATA2 input supplied to the first computation cell 202.

Figure 3 illustrates a computation engine 300 where the DATA INPUT to the computation engine includes three parallel data inputs, DATA1, DATA2, DATA3. In addition, the data input and output of each computation cell 302, 304, 306 correspond to three parallel data inputs and three parallel data outputs labeled DATA1, DATA2, DATA3. The data inputs DATA1, DATA2, DATA3 of the computation engine 300 are coupled to the corresponding data inputs of the first computation cell 302. The data outputs of each of the first through (M-1)th computation cells are coupled to the corresponding data inputs of the next computation cell in the sequence of computation cells 302, 304, 306.

In addition to being coupled to the DATA1 input of the first computation cell 302, the DATA1 input of the computation engine is coupled to a BROADCAST1 input of each of the computation cells 302, 304, 306. In a similar manner, the DATA2 input of the computation engine 300 is coupled to a BROADCAST2 input of each of the computation cells 302, 304, 306.

A value of zero is supplied to a cascade control input of the first computation cell 302. The cascade control output of each of the first through (M-1)th computation cells is coupled to the cascade control input of the next computation cell in the sequence of computation cells.

The DATA1, DATA2 and cascade control outputs of the Mth computation cell 306 go unused. The DATA3 output of the Mth computation cell 306 is coupled to the data output of the computation engine 300.

A global control register 308 is provided for storing a control value used to configure and/or reset components included in each of the M computation cells 302, 304, 306. A global control value input of the computation engine is coupled to a corresponding input of the global control register 308. A global control value output of the global control register is coupled to a corresponding input of each one of the computation cells 302, 304, 306.

The computation cells of the present invention used in the computation engine 200 or 300 may be implemented using a relatively small number of basic arithmetic elements, such as a multiplier, an adder, a subtractor, an adder/subtractor, and/or a comparator. The computation cell normally also includes some memory elements, e.g., registers, so that previous input signals or the partial results of a long computation can be stored. Multiplexers that are controlled by different fields of the control value stored in the global control register and/or by the cascade control signal can be used to configure the computation cell's elements and to direct various signals, or the previously computed partial results of a long computation, to the arithmetic elements for computation.

The computation cells of a computation engine 200, 300 of the present invention are controlled in unison by the value stored in the global control register and individually by cascade control signals generated internally and/or received from a preceding computation cell. Since the global control value output by the global control register controls the configuration of the computation cells, it is possible to reconfigure the computation cells of a computation engine by simply updating the global control value stored in the global control register 208, 308.

The cascade control signal, generated in some embodiments by each of the computation cells, is used to further refine the functionality within individual computation cells. That is, a cascade control output (CCO) signal generated by a computation cell based on one or more of its input signals may be used to control the next computation cell in the sequence of first through Mth computation cells.
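The sorting cell of Fig. 5 is not detailed in this section, so the following is only a hypothetical sketch of how a broadcast input plus a cascade control chain can implement a hardware insertion sort, a well-known systolic technique offered for illustration rather than as the Fig. 5 circuit. Each cell holds one value, kept in descending order. A new sample is broadcast to every cell; a cell asserts its cascade control output if it, or any preceding cell, is taking a new value, and a cell whose cascade control input is asserted takes its predecessor's old value instead. As described for Figs. 2 and 3, the first cell's cascade control input is tied to 0. All function names are hypothetical.

```python
NEG_INF = float("-inf")  # marks an empty cell


def clock_sort_cells(cells, sample):
    """One clock of a chain of sorting cells kept in descending order.

    cells[i] is the value stored in cell i. All cells compute from the old
    values, as synchronous hardware would, and the new values are returned.
    """
    new = list(cells)
    cascade_in = False                        # first cell's cascade input tied to 0
    for i, stored in enumerate(cells):
        take_new = sample > stored            # local compare with the broadcast sample
        if cascade_in:
            new[i] = cells[i - 1]             # shift: take predecessor's old value
        elif take_new:
            new[i] = sample                   # insert the broadcast sample here
        cascade_in = cascade_in or take_new   # cascade control output to next cell
    return new


def sort_block(samples, num_cells):
    """Stream samples through the chain; the cells end up holding the
    num_cells largest samples in descending order."""
    cells = [NEG_INF] * num_cells
    for s in samples:
        cells = clock_sort_cells(cells, s)
    return cells


print(sort_block([3, 1, 4, 1, 5, 9, 2, 6], 8))  # [9, 6, 5, 4, 3, 2, 1, 1]
print(sort_block([3, 1, 4, 1, 5, 9, 2, 6], 3))  # [9, 6, 5]
```

With fewer cells than samples the chain keeps only the largest values, which matches the use of sorting to identify maximum amplitude values in a block.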

Individual computation cells, M of which may be used to implement the computation engine 200 or computation engine 300, are illustrated in Figs. 4-7. The computation cells in Figs. 4-6 are well suited for performing cross-correlation, sorting, and FIR filtering operations, respectively. Some of the computation cells illustrated in Figs. 4-6 do not use all the inputs and outputs shown in the computation engine of Fig. 3. Accordingly, when a computation engine is constructed from computation cells which use fewer inputs and outputs than shown in Fig. 3, the unused signal paths, e.g., lines, and unused inputs/outputs may be omitted from the Fig. 3 computation engine in the interest of implementation efficiency.

The computation cell 400 illustrated in Fig. 4 is well suited for performing correlation operations. M of the Fig. 4 computation cells may be used to implement a computation engine 200, 300 suitable for performing M cross-correlation operations in parallel. Some applications, such as, e.g., speech compression, normally involve a large, fixed number of cross-correlation operations to be performed on units of data being communicated. It is desirable that the computation engine 200, 300 include enough computation cells to perform the multiply, add, and accumulate computations associated with each element of a data sequence corresponding to a portion of a voice signal being processed in one or a small number of clock cycles. If it is not possible to provide enough computation units to perform the cross-correlation processing in a single clock cycle, it is desirable that the number of computation cells be an integer divisor of the number of elements in a data sequence upon which a cross-correlation operation is performed. Various exemplary numbers of computation cells which may be well suited for implementing a computation engine 200 or 300 for purposes of cross-correlation include 8, 10, 20, 40, 60, and 240. These numbers of computation cells are particularly useful in voice applications where various voice compression standards involve performing correlation operations on 40, 60, or 240 element sequences.
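The preference for an integer divisor can be checked with a little arithmetic. The sketch below assumes, for illustration, that each pass of the engine computes one output element per cell, so N outputs on M cells take ceil(N/M) passes, and any shortfall in the final pass leaves cells idle. The function name `cell_utilization` is hypothetical.

```python
import math


def cell_utilization(seq_len, num_cells):
    """Passes needed to compute seq_len outputs on num_cells cells, and how
    many cell slots sit idle in the final pass (one output per cell per pass)."""
    passes = math.ceil(seq_len / num_cells)
    idle = passes * num_cells - seq_len
    return passes, idle


for m in (8, 10, 20, 40, 60, 240):
    print(m, [cell_utilization(n, m) for n in (40, 60, 240)])
```

Under this model, 10 and 20 cells divide all three sequence lengths evenly, while each of the other listed sizes leaves idle cells for at least one of the lengths (for example, 40 cells on a 60-element sequence need 2 passes with 20 idle slots).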

The computation cell 400 comprises a first storage element 402 labeled STORAGE1, an additional storage element 414 labeled STORAGE3, a multiplier 404, a summer 408, a first MUX 406 labeled MUX4, and a second MUX 410 labeled MUX3.

A first operand, Operand1, is received via a DATA1 input and is supplied to the computation cell's STORAGE1 storage element 402 and to an A input of multiplier 404. A second operand, Operand2, is received via a Broadcast2 input of the computation cell 400 and supplied to a B input of multiplier 404. Multiplier 404 operates to multiply Operand1 and Operand2 together and to supply the result to an I1 input of MUX4 406. A logic value of 0 is applied to an I0 input of MUX4. MUX4 is controlled by the signal M4CM, which will be discussed in detail below. MUX4, under control of the signal M4CM, operates to connect one of its inputs to its output at any given time. The output of MUX4 is coupled to a B input of summer 408.



A DATA3 input of the computation cell 400 is
coupled to an I1 input of MUX3 410. In this manner, MUX3
receives the Data3 signal generated by the previous
computation cell or, if the computation cell 400 is the
first computation cell in a computation engine, an input
value of zero. MUX3 410 receives at its data input
labeled I0 the value output by storage element 414, which
corresponds to the DATA3 output of the computation cell
400.

MUX3 410 is controlled by control signal M3CM
to connect one of its inputs to its output at any given
time. The M3CM control signal, like the M4CM control
signal discussed elsewhere, is a two-bit signal, with
each bit of the signal identified by a label: [0]
indicates the lower-order bit and [1] indicates the
higher-order bit of the signal.

The output of MUX3 410 serves as input A to 
summer 408. The output of summer 408 is coupled to the 
input of STORAGE 3 414. The output of STORAGE3 414 serves 
as the Data3 output of the computation cell 400. 

The contents of STORAGE 1 and STORAGE 3 may be
reset to zero via storage control signals S1R and S3R,
respectively. These control signals, as well as control
signals M3CM and M4CM, are generated by control logic 312
from the global control value supplied to computation
cell 400. A circuit which may be used as control logic
312 will be discussed in detail with regard to Fig. 8.
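To make the Fig. 4 data path concrete, here is a behavioral Python sketch of one clock of cell 400. It assumes that the resets are synchronous and take priority over the newly loaded value, and that a select value of 0 picks input I0 and 1 picks input I1 (the encoding given in the MUX truth table for cell 700). The class and parameter names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Cell400:
    """Behavioral sketch of computation cell 400 (Fig. 4)."""
    storage1: int = 0  # STORAGE 1 402, loaded from DATA1 (Operand1)
    storage3: int = 0  # STORAGE 3 414, drives the DATA3 output

    def step(self, data1, broadcast2, data3_in, m3cm, m4cm, s1r=0, s3r=0):
        """Advance one clock and return the new DATA3 output.

        m3cm / m4cm: 0 selects input I0, 1 selects input I1 (assumed encoding).
        s1r / s3r: synchronous resets that override the loaded value (assumed).
        """
        product = data1 * broadcast2                # multiplier 404: Operand1 * Operand2
        mux4_out = (0, product)[m4cm]               # MUX4: I0 = logic 0, I1 = product
        mux3_out = (self.storage3, data3_in)[m3cm]  # MUX3: I0 = own STORAGE 3, I1 = DATA3 in
        self.storage1 = 0 if s1r else data1
        self.storage3 = 0 if s3r else mux3_out + mux4_out  # summer 408 -> STORAGE 3
        return self.storage3
```

With m3cm = 0 and m4cm = 1 the cell accumulates products in place; with m3cm = 1 and m4cm = 0 it copies the previous cell's DATA3 value, which is one way results could be shifted along a chain of cells.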
In the Fig. 4 embodiment, the control signals
generated for each of the M computation cells in a
computation engine 200 or 300 will be the same since they
are generated from the same global control value.
Accordingly, the control logic 312 may be placed in the
computation engine 200 or 300 external to the individual
computation cells 400. In this manner, a single control
circuit 312 may be used to control each of the M
computation cells 400, thereby eliminating the need for
each of the M cells 400 to include a control logic
circuit 312.

Fig. 5 illustrates a sorting computation cell
500 implemented in accordance with the present invention.
M computation cells 500 may be used to implement a
computation engine 200 or 300.

The computation cell 500 includes a first
multiplexer labeled MUX4 406', a second multiplexer
labeled MUX3 410', a controllable adder/subtractor 508, a
comparator 502, and a storage element labeled STORAGE 3
414. In regard to signal inputs, the sorting computation
cell 500 includes a Broadcast1 input, a Broadcast2 input,
a Data3 signal input, a global control value input and a
cascade control input. In regard to signal outputs, the
sorting cell 500 includes a cascade control signal output
and a Data3 signal output.

The components of the computation cell 500 are
coupled together as illustrated in Fig. 5. In
particular, the Broadcast1 input is coupled to an I2
input of MUX4 406'. Another input of MUX4, an I0 input,
is supplied with a constant value of zero. The output of
MUX4 406' is coupled to a B input of the controllable
adder/subtractor 508.

The Broadcast2 input is coupled to an A input
of the comparator 502 and to an I2 input of MUX3 410'.
The Data3 input is coupled to an I1 input of MUX3 410'.
Another input, an I0 input of MUX3 410', is coupled to the
output of storage element STORAGE3 414. The output of
MUX3 is coupled to an A input of the adder/subtractor
508. The adder/subtractor 508 receives as a control input
an ASC control signal which corresponds to a pre-selected
bit of the global control input value.

The output of the adder/subtractor 508 is coupled to the
input of STORAGE 3 414. The output of STORAGE 3 414 is
coupled to the DATA3 output of the computation cell 500 in
addition to a B input of comparator 502. The output of
comparator 502 is coupled to the cascade control output
of the computation cell 500.

Operation of the sorting computation cell 500
will be clear in view of the discussion of sorting
performed by the multi-purpose computation cell 700 which
may be configured to operate in generally the same manner
as computation cell 500 for sorting purposes.

Fig. 6 illustrates an FIR filter computation
cell 600 which supports programmable filter weights. The
computation cell 600 of the present invention includes a
multiplexer labeled MUX1 602, a controllable adder 608, a
multiplier 404, and first and second storage elements
402, 414 labeled Storage1 and Storage3, respectively. In
regard to signal inputs, the computation cell 600
includes a Data1 signal input, a Broadcast2 signal input,
a Data3 signal input, and a global control value input.
In regard to signal outputs, the FIR computation cell 600
includes a DATA1 signal output and a DATA3 signal output.

The components of the computation cell 600 are
coupled together as illustrated in Fig. 6. In
particular, the Data1 input is coupled to an I1 input of
MUX1 602. Another input of MUX1, an I0 input, is
supplied with the value output by STORAGE 1 402. The
output of MUX1 602 is coupled to an A input of the
multiplier 404 and to the input of STORAGE 1 402.

The Broadcast2 input is coupled to a B input of
the multiplier 404. The output of multiplier 404 is
coupled to a B input of the controllable adder 608. The
DATA3 input is coupled to an A input of the adder 608.
The output of the adder 608 is coupled to the input of
STORAGE 3 414. The output of STORAGE 3 414 is coupled to
the DATA3 output of the computation cell 600.

The global control value signal input of the
computation cell 600 is coupled to control logic 312''
which generates, from the global control value, control
signals used to control MUX1 and adder 608 and to reset
the contents of STORAGE1 402 and STORAGE3 414 as
necessary.

In the Fig. 6 embodiment, the control signals
generated for each of the M computation cells 600 in a
computation engine 200 or 300 will be the same since they
are generated from the same global control value.
Accordingly, the control logic 312'' may be placed in the
computation engine 200 or 300 external to the individual
computation cells 600. In this manner, a single control
circuit 312'' may be used to control each of the M
computation cells 600, thereby eliminating the need for
each of the M cells 600 to include a control logic
circuit 312''.

Operation of the FIR filter computation cell
600 will be clear in view of the discussion of filtering
performed by the multi-purpose computation cell 700 which
may be configured, for FIR filtering purposes, to operate
in generally the same manner as computation cell 600.
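The text defers the FIR dataflow to the discussion of cell 700. Reading the Fig. 6 wiring, one plausible interpretation (our assumption, not stated in this section) is a transposed-form FIR: weights are shifted into STORAGE1 through the DATA1 chain and then held by MUX1, each input sample is broadcast on Broadcast2, and partial sums ripple from cell to cell through DATA3. The Python sketch below models only the filtering phase, with hypothetical names, and checks it against a direct FIR.

```python
def fir_cells_600(held_weights, samples):
    """Transposed-form FIR built from a chain of cell-600-like stages.

    held_weights[i] is the weight assumed to be held in STORAGE1 of cell i
    (cell 0 is first in the chain). Every input sample is broadcast to all
    cells on Broadcast2 in the same cycle, and each cell adds its product to
    the DATA3 value of the previous cell (0 for the first cell).
    Returns the DATA3 output of the last cell after each cycle.
    """
    m = len(held_weights)
    storage3 = [0] * m
    outputs = []
    for x in samples:
        data3_in = [0] + storage3[:-1]                  # previous cell's STORAGE3
        storage3 = [data3_in[i] + held_weights[i] * x   # multiplier 404 + adder 608
                    for i in range(m)]
        outputs.append(storage3[-1])
    return outputs


def direct_fir(h, x):
    """Reference FIR: y[n] = sum_k h[k] * x[n - k], with x[j] = 0 for j < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]
```

Under this interpretation the last cell holds h[0], so coefficients are loaded in reverse order: `fir_cells_600(h[::-1], x)` matches `direct_fir(h, x)`.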

Fig. 7 illustrates a multi-purpose computation
cell 700 which can be configured as part of a computation
engine 200, 300 to perform a wide variety of tasks
including cross-correlation, sorting and FIR filtering, to
name but a few. M computation cells 700 may be used to
implement the computation engine 200 or 300. In
particular embodiments M is equal to 8, 10, 20, 40, 60,
or 240, although other positive numbers for M are
contemplated and possible. In most cases M is greater
than 2.

In Fig. 7, the computation cell 700 comprises 4
multiplexers (MUXes) labeled MUX1 602, MUX2 704, MUX3
410', and MUX4 406''; 3 storage elements labeled STORAGE1
402, STORAGE2 706, and STORAGE3 414; 1 multiplier 404; 1
adder/subtractor 508; and 1 comparator 708, in addition to
a control circuit 312'''. The various components of the
computation cell 700 are coupled together as illustrated
in Fig. 7. The control signals to the MUXes have been
labeled M1C, M2C, M3CM, and M4CM for MUX1, MUX2, MUX3,
and MUX4, respectively. In addition, the control signal
for the adder/subtractor has been labeled ASC. The reset
signals for the STORAGE1, STORAGE2 and STORAGE3 storage
elements have been labeled S1R, S2R, S3R, respectively.

In some embodiments, STORAGE1 402 and STORAGE2
706 are of such a size that they can store the same
number of bits of binary data, while STORAGE3 414 is of
such a size that it can store approximately twice the
number of bits that STORAGE1 402 can store. The larger
size of STORAGE3 414 accommodates the storage of the
result of a multiplication and addition operation. The
contents and output of STORAGE1 402, STORAGE2 706 and
STORAGE3 414 will be reset to 0 when their respective
reset signals S1R, S2R, or S3R are set to logic 1.

Adder/subtractor 508 is controlled by the ASC
signal which, as will be discussed below, is derived from
the global control value output by the global control
register. In some embodiments, the ASC signal
corresponds to a selected bit of the global control value,
which may be a multi-bit value, e.g., a 12-bit value.

When ASC is set to a value of logic 0, the
adder/subtractor performs addition (A + B) of its 2
inputs. When ASC is set to a value of logic 1, the
adder/subtractor performs subtraction (A - B) of its 2
inputs.

The comparator 708 performs an arithmetic
comparison of its 2 inputs and generates a single-bit
logic signal labeled CC. The output CC is logic 1 when
the CA input is larger than or equal to the CB input
(CA >= CB). The output CC is logic 0 when the CA input is
less than the CB input (CA < CB).
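The behavior of the adder/subtractor 508 and the comparator 708 reduces to two small functions. This Python sketch is behavioral only; bit widths and overflow handling are not modeled, and the function names are illustrative.

```python
def adder_subtractor(a, b, asc):
    """Adder/subtractor 508: ASC = 0 gives A + B, ASC = 1 gives A - B."""
    if asc not in (0, 1):
        raise ValueError("ASC is a single control bit")
    return a - b if asc else a + b


def comparator(ca, cb):
    """Comparator 708: CC = 1 when CA >= CB, CC = 0 when CA < CB."""
    return 1 if ca >= cb else 0
```

Note that equality produces CC = 1, so a tie is reported the same way as CA being strictly larger.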

The 4 MUXes 602, 704, 406'', 410' in the
computation cell are 3-input, 1-output MUXes. Thus, for
each MUX, one of the MUX's 3 inputs will be coupled to
its output at any time. Each MUX 602, 704, 406'', 410'
is responsive to a 2-bit control signal (labeled MC) to
determine which one of the inputs is coupled to the
output at a particular point in time. The truth table
below describes how the control signal supplied to a MUX
causes the MUX to direct one of its inputs to its output.



MC    Mux Output
00    I0
01    I1
10    I2
11    Don't care
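The MUX truth table translates directly into a selection function. In this Python sketch (the name `mux3` is ours), the "don't care" code 11 is rejected with an error rather than mapped to an arbitrary input; that is a modeling choice, not something the text specifies.

```python
def mux3(mc, i0, i1, i2):
    """3-input, 1-output MUX following the MC truth table.

    MC = 00 -> I0, 01 -> I1, 10 -> I2. MC = 11 is "don't care" in the table;
    this sketch rejects it instead of picking an input (a modeling choice).
    """
    if mc == 0b00:
        return i0
    if mc == 0b01:
        return i1
    if mc == 0b10:
        return i2
    raise ValueError("MC = 11 is a don't-care code")
```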



The global control value which is stored in the
global control register 308 is used to configure, e.g.,
control the processing of, the computation engine 300 so
it can perform different functions and computations as
required for a particular application. Thus, the
computation cells of a computation engine can be
reconfigured to perform different functions and
computations by simply loading a new control value into
the global control register 308, which supplies the global
control value to each of the individual computation
cells.

For a computation engine 300 of the type
illustrated in Fig. 3 implemented using M computation
cells of the type illustrated in Fig. 7, a 12-bit global
control value and global control register 308 can be
used. In accordance with one exemplary embodiment of the
present invention, the 12-bit value is divided into
several bit fields, with each bit field performing a
different control function, e.g., by controlling a
different circuit in each computation cell. The
following table describes an exemplary bit field mapping
of the global control value and thus of the global control
register contents.
Bit Number    Field Name
11            S1R
10            S2R
9             S3R
8             ASC
7:6           M1C
5:4           M2C
3:2           M3C
1:0           M4C
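A Python sketch of this exemplary mapping: `decode_global_control` (a hypothetical helper) splits a 12-bit value into the named fields, treating bit 11 as the most significant bit. Decoding the binary words used in the autocorrelation example, "111000000000" and "000000100101", is a quick check of the bit positions.

```python
# Exemplary field mapping: name -> (least significant bit, width); bit 11 is the MSB.
FIELDS = {
    "S1R": (11, 1), "S2R": (10, 1), "S3R": (9, 1), "ASC": (8, 1),
    "M1C": (6, 2), "M2C": (4, 2), "M3C": (2, 2), "M4C": (0, 2),
}


def decode_global_control(value):
    """Split a 12-bit global control value into its named bit fields."""
    if not 0 <= value < (1 << 12):
        raise ValueError("global control value must fit in 12 bits")
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in FIELDS.items()}
```

For instance, `decode_global_control(int("000000100101", 2))` yields M2C = 10, M3C = 01 and M4C = 01 with all other fields 0.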



Bit fields S1R, S2R, S3R correspond to the like-named
signals which are used to control whether the
storage elements 1, 2, and 3 in the computation cells are
reset to 0. The corresponding register bits can be
directly connected to the storage element reset signal
inputs in each of the computation cells or routed through
a control logic circuit 312''', which is then responsible
for coupling the register bit values to the storage
element reset inputs. When S1R contains a 1, STORAGE 1 is
reset to 0. When S2R contains a 1, STORAGE 2 is reset to
0. When S3R contains a 1, STORAGE 3 is reset to 0.



Global control register bit field ASC is used
to control whether the adder/subtractor performs
additions or subtractions. The ASC register bit can be
connected directly to the ASC control input of the
adder/subtractor 508 included in each computation cell, or
through the control logic circuit 312'''. When ASC has a
logic value of 0, additions are performed by the
controlled adder/subtractors. When ASC has a logic value
of 1, subtractions are performed by the controlled
adder/subtractors.

Global control register bit fields M1C and M2C
are used to control the muxes MUX1 and MUX2 of each
computation cell. They can be directly connected to the
mux control signal inputs M1C and M2C of MUX1 and MUX2,
respectively, or coupled thereto via control logic
312'''.

Global control register bit fields M3C and M4C
are used to control the muxes MUX3 410' and MUX4 406'',
respectively. The control of MUX3 and MUX4 also depends
on the value of the cascade control output (CCO) signal
generated by the computation cell in which the controlled
MUX is located. The control is also a function of the
value of the cascade control signal input to the
computation cell in which the controlled MUX is located.

Control logic 312''' is responsible for
generating the control signals M3CM and M4CM which are
used to control muxes MUX3 410' and MUX4 406''. The
following table illustrates the value of signals M3CM and
M4CM, based on the indicated input values.



M3C (or M4C)    M3CM (or M4CM)
00              00
01              01
10              10
11              Depends on Cascade Control Output (CCO) and
                Cascade Control Input (CCI)



Thus, the present invention provides a way to
locally control MUX3 410' and MUX4 406'' of each
computation cell based on the cascade control output and
cascade control input associated with the computation
cell being controlled.

The portion of the control circuit 312''' used
to control MUX3 410' in each computation cell 700 can be
described by the truth table below. The truth table
describes how the M3CM control signal can be based on the
M3C field of the global control value and the locally
generated cascade control output (CCO) and the cascade
control input (CCI) obtained, e.g., from the previous
computation cell 700 in the sequence of M computation
cells.
M3C    CCO    CCI    M3CM
00     X      X      00
01     X      X      01
10     X      X      10
11     0      0      01
11     0      1      10
11     1      0      00
11     1      1      00



The 'X' marks in the above truth table denote
"don't cares" in digital logic, where the 'X' can be
either 0 or 1 without affecting the output.

Similarly, the portion of the control circuit
312''' used to control MUX4 406'' in each computation
cell 700 can be described by the truth table below.



M4C    CCO    CCI    M4CM
00     X      X      00
01     X      X      01
10     X      X      10
11     0      0      00
11     0      1      10
11     1      0      00
11     1      1      00
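The two truth tables can be captured directly as lookup tables. In the Python sketch below (names are hypothetical), a 00, 01 or 10 field value passes through unchanged, and the (CCO, CCI) pair is consulted only when the field is 11, exactly as the M3CM and M4CM tables specify.

```python
# (CCO, CCI) -> select code used when the M3C / M4C field is 11,
# transcribed from the M3CM and M4CM truth tables.
M3CM_CASCADE = {(0, 0): 0b01, (0, 1): 0b10, (1, 0): 0b00, (1, 1): 0b00}
M4CM_CASCADE = {(0, 0): 0b00, (0, 1): 0b10, (1, 0): 0b00, (1, 1): 0b00}


def mux_select(field, cco, cci, cascade_table):
    """Return M3CM (or M4CM) for a 2-bit M3C (or M4C) field value."""
    if field != 0b11:
        return field  # 00, 01 and 10 pass through unchanged
    return cascade_table[(cco, cci)]
```

Only the CCO = 0 rows differ between the two tables, which is where the locally generated comparison result changes the selected input.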
A control circuit 800 that implements the
functionality of the above 2 truth tables, and which can
be used as the control circuit 312''', is illustrated in
Fig. 8.

As illustrated, the control circuit 800
includes first through seventh AND gates 802, 804, 808,
810, 814, 816, 820, and three OR gates 806, 812, 818,
arranged as illustrated in Fig. 8. Negated inputs of AND
gates are illustrated in Fig. 8 using circles at the
location of the negated AND gate input.

A global control value input receives the 12-bit
global control value output by global control
register 308. The bits of the global control value are
divided into the individual signals to which they
correspond and are either output directly or supplied to
the logic elements of the control circuit 800, as
indicated through the use of labeling. A pointed
connector is used to indicate a signal that is supplied
to one or more correspondingly labeled AND gate inputs.

Global control value bits [0] and [1], which
correspond to signals M4C[0] and M4C[1], are supplied to
AND gates 814, 816 and 820. From these signals the AND
gate 820 generates the signal M4CM[0], which is the lower
bit of the signal M4CM. AND gate 816 receives the cascade
control signals CCO and CCI in addition to the signals
M4C[0] and M4C[1]. The OR gate 818 ORs the outputs of the
AND gates 814, 816 to generate the higher bit [1] of the
M4CM signal.

Global control value bits [2] and [3], which
correspond to signals M3C[0] and M3C[1], are supplied to
AND gates 808, 810. AND gate 810 is also supplied with
the cascade control signals CCO and CCI. The OR gate 812
generates the lower bit [0] of the signal M3CM by ORing
the outputs of AND gates 808 and 810.

Global control value bits [2] and [3], which
correspond to signals M3C[0] and M3C[1], are also supplied
to AND gates 802, 804. AND gate 804 is also supplied
with the cascade control signals CCO and CCI. The OR
gate 806 generates the higher bit [1] of the signal M3CM
by ORing the outputs of AND gates 802 and 804.

The control signals M2C, M1C, ASC, S3R, S2R, and
S1R are generated by the control circuit 800 by simply
splitting out the corresponding bits of the global
control value and using the appropriate bits as control
signals.
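The text identifies which AND gates feed which OR gates but does not state which gate inputs are negated. The Python sketch below infers those negations from the M3CM and M4CM truth tables (an assumption, not read off Fig. 8) and expresses circuit 800 as Boolean equations on single bits; the function names are illustrative.

```python
def inv(bit):
    """Negated AND-gate input (drawn as a circle in Fig. 8)."""
    return bit ^ 1


def control_800(m3c, m4c, cco, cci):
    """Gate-level sketch of the M3CM / M4CM logic of control circuit 800.

    m3c, m4c: 2-bit field values; cco, cci: cascade control bits.
    Returns (M3CM, M4CM) as 2-bit integers.
    """
    m3_0, m3_1 = m3c & 1, (m3c >> 1) & 1
    m4_0, m4_1 = m4c & 1, (m4c >> 1) & 1

    # M4CM[0]: AND 820 alone (true only for M4C = 01)
    m4cm_0 = m4_0 & inv(m4_1)
    # M4CM[1]: OR 818 of AND 814 (M4C = 10) and AND 816 (M4C = 11, CCO = 0, CCI = 1)
    m4cm_1 = (inv(m4_0) & m4_1) | (m4_0 & m4_1 & inv(cco) & cci)
    # M3CM[0]: OR 812 of AND 808 (M3C = 01) and AND 810 (M3C = 11, CCO = 0, CCI = 0)
    m3cm_0 = (m3_0 & inv(m3_1)) | (m3_0 & m3_1 & inv(cco) & inv(cci))
    # M3CM[1]: OR 806 of AND 802 (M3C = 10) and AND 804 (M3C = 11, CCO = 0, CCI = 1)
    m3cm_1 = (inv(m3_0) & m3_1) | (m3_0 & m3_1 & inv(cco) & cci)

    return (m3cm_1 << 1) | m3cm_0, (m4cm_1 << 1) | m4cm_0
```

Because there are only 2 x 2 x 4 x 4 input combinations, the equations can be checked exhaustively against the truth tables.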

The control circuit 800 is suitable for use as
the control logic circuit 312''' used in the computation
cell illustrated in Fig. 7. Control circuits 312'', 312'
and 312 may be implemented by using a control circuit
which is the same as or similar to the one illustrated in
Fig. 8. However, in such embodiments, unused inputs and
outputs and the control logic used to generate unused
outputs may be omitted for purposes of implementation
efficiency and cost savings.

The multi-purpose computation cell 700 can be
used to implement a computation engine 300 suitable for a
wide range of applications, e.g., processing functions.
Various processing operations, as well as the configuring
of the elements within a computation cell 700 to perform
the processing functions, will now be described.

Autocorrelation Functionality 

Autocorrelation, a special case of
cross-correlation, is an example of one function which
can be performed using a computation engine 300 which
includes computation cells 700.

An autocorrelation sequence for a finite
sequence of numbers can be described with the following
equation:

$$y_{xx}[n] = \sum_{k=0}^{N-1-n} x[k]\,x[k+n], \qquad n = 0, 1, \ldots, N-1$$

where x[n] is a finite input sequence of N numbers and
y_xx[n] is the autocorrelation sequence of x[n]. To
compute the autocorrelation sequence, approximately N^2/2
multiplications (N(N+1)/2 exactly) and (N^2 - N)/2
additions are required.
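For reference, the sequential computation can be sketched in Python as a direct evaluation of y_xx[n] = sum over k from 0 to N-1-n of x[k]x[k+n]. The function (a hypothetical name) also counts operations, giving N(N+1)/2 multiplications and (N^2 - N)/2 additions; this nested loop is the kind of iterative construct a processor with one or two MAC units must execute.

```python
def autocorrelation(x):
    """Direct evaluation of y_xx[n] = sum_{k=0}^{N-1-n} x[k] * x[k+n], n = 0..N-1.

    Returns (y, multiplications, additions) so the operation counts can be
    compared with the figures quoted in the text.
    """
    n_len = len(x)
    y, mults, adds = [], 0, 0
    for lag in range(n_len):
        terms = [x[k] * x[k + lag] for k in range(n_len - lag)]
        mults += len(terms)
        adds += len(terms) - 1  # summing T terms takes T - 1 additions
        y.append(sum(terms))
    return y, mults, adds
```

For x = [1, 2, 3] this gives y = [14, 8, 3] with 6 multiplications and 3 additions.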
As discussed above, in typical microprocessors
and DSPs with two or fewer MAC units, a software program
with an iterative loop construct is required to compute
this sequence. In the typical microprocessors or DSPs
which have only 1 or 2 multiply or MAC units, the
computation of N autocorrelation sequence numbers will
normally take approximately N^2 or more computation
cycles due to the hardware limitations.

With the computation engine 200 or 300 of the
present invention, each computation cell 700 can be
configured in the following fashion to compute the
autocorrelation sequence:



1) STORAGE1, STORAGE2, and STORAGE3 are initialized to
contain 0.

This step can be performed by writing the binary number
"111000000000" into the global control register 208 or
308.

2) MUX1 selects the DATA1 input to supply Operand1.

3) MUX2 selects the BROADCAST2 input to supply Operand2.

4) MUX3 selects DATA3 as one of the inputs to the
adder/subtractor 508.

5) MUX4 selects the output of the multiplier 404 as the
other input to the adder/subtractor 508.

These steps can be performed by writing the binary number
"000000100101" into the global control register 208 or
308.
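The binary words used in steps 1 to 5 follow from the exemplary bit field mapping of the global control register. The Python sketch below (a hypothetical helper, not part of the described apparatus) rebuilds them from field values: the reset word sets S1R, S2R and S3R, and the configuration word sets M2C = 10, M3C = 01 and M4C = 01 with ASC = 0 (addition) and M1C = 00. Which physical input each code selects in cell 700 depends on the Fig. 7 wiring and is not modeled here.

```python
# Field positions from the exemplary mapping: name -> (lsb, width); bit 11 is the MSB.
FIELDS = {
    "S1R": (11, 1), "S2R": (10, 1), "S3R": (9, 1), "ASC": (8, 1),
    "M1C": (6, 2), "M2C": (4, 2), "M3C": (2, 2), "M4C": (0, 2),
}


def encode_global_control(**fields):
    """Pack named field values into the 12-character binary string written
    to the global control register. Unspecified fields default to 0."""
    value = 0
    for name, field_value in fields.items():
        lsb, width = FIELDS[name]  # KeyError for an unknown field name
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        value |= field_value << lsb
    return format(value, "012b")
```

The same helper reproduces the readout word "000000000100" used after the computation phase (M3C = 01, M4C = 00).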

For the entire computation engine 300, the input signals
are configured in the following fashion:

6) The sequence x[0], x[1], x[2], ..., x[N - 1] is fed
to the DATA1 input, 1 per computation cycle.

7) The sequence x[0], x[1], x[2], ..., x[N - 1], 1 per
computation cycle, is also fed to DATA2, which is
coupled to the BROADCAST2 input of each of the
computation cells 700.

After 1 computation cycle, the first
computation cell 302 would have computed
x[0]x[0].

After 2 computation cycles,
the first computation cell 302 would have
computed x[0]x[0] + x[1]x[1], and
the second computation cell 304 would have
computed x[0]x[1].

After N computation cycles, the first
computation cell 302 would have computed:

x[0]x[0] + x[1]x[1] + ... + x[N - 1]x[N - 1] = y_xx[0]

The second computation cell 304 would have
computed:

x[0]x[1] + x[1]x[2] + ... + x[N - 2]x[N - 1] = y_xx[1]

The Nth computation cell (306 assuming N=M)
would have computed:

x[0]x[N - 1] = y_xx[N - 1]

At this point, the computation engine 300 can
be reconfigured (by writing "000000000100" into the
global control register) so that in each of the
computation cells 700:

8) MUX3 selects the DATA3 input as one of the inputs to
the adder/subtractor 508.

9) MUX4 selects Constant (0) as the other input to the
adder/subtractor 508.

The output of the computation engine 300 can be
used to shift out the autocorrelation sequence
y_xx[N - 1], y_xx[N - 2], ..., y_xx[1], y_xx[0]. The number
of computation cycles it takes to compute this
autocorrelation sequence is N. An additional N cycles may
be used to read out the result from the computation
engine 300.
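The cycle-by-cycle results listed above can be reproduced with a small Python model of the compute phase. It assumes, from those listed results rather than from an explicit statement, that each cell's DATA1 input is the previous cell's STORAGE1 output, so x[t] reaches cell k after k cycles, while BROADCAST2 delivers the current sample to every cell at once; mux settings are abstracted away. For autocorrelation the same sequence is passed as both inputs. The names `systolic_correlate` and `shift_out` are illustrative.

```python
def systolic_correlate(x1, x2, num_cells):
    """Cycle-level sketch of the engine's compute phase.

    Each cycle, x1[t] enters the DATA1 input of the first cell and moves one
    cell further along the STORAGE1 chain per cycle, while x2[t] is broadcast
    to every cell on BROADCAST2. Cell k accumulates into STORAGE3, so after
    len(x1) cycles it holds sum_n x1[n] * x2[n + k].
    Returns (STORAGE3 contents of all cells, number of compute cycles).
    """
    storage1 = [0] * num_cells
    storage3 = [0] * num_cells
    cycles = 0
    for a, b in zip(x1, x2):
        operand1 = [a] + storage1[:-1]      # cell k sees cell k-1's STORAGE1
        for k in range(num_cells):
            storage3[k] += operand1[k] * b  # multiply-accumulate in every cell
        storage1 = operand1
        cycles += 1
    return storage3, cycles


def shift_out(storage3):
    """Readout order only: values leave the last cell first."""
    return list(reversed(storage3))
```

For x = [1, 2, 3] on 3 cells, the compute phase takes 3 cycles and `shift_out` yields y_xx[2], y_xx[1], y_xx[0] = 3, 8, 14; the N readout cycles themselves are not modeled.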

Cross-Correlation Functionality 

The computation engine 300, implemented using
computation cells 700, can also be used to perform
cross-correlation operations.
A cross-correlation sequence for a finite
sequence of real numbers can be described with the
following equation:

$$y_{x_1x_2}[n] = \sum_{k=0}^{N-1-n} x_1[k]\,x_2[k+n], \qquad n = 0, 1, \ldots, N-1$$

where x1[n] and x2[n] are finite input sequences of N
numbers and y_x1x2[n] is the cross-correlation sequence
between x1[n] and x2[n]. Like autocorrelation, it
normally takes approximately N^2/2 multiplications and
(N^2 - N)/2 additions to compute a cross-correlation
sequence. In essence, an autocorrelation sequence is just
a special case of a cross-correlation sequence, with
x1[n] = x2[n].

With the computation engine 300, each
computation cell 700 can be configured in the following
fashion to compute the cross-correlation sequence:



1) STORAGE1 402, STORAGE2 706, and STORAGE3 414 are
initialized to contain the value 0.

This step can be performed by writing the
binary number "111000000000" into the global control
register 308.

2) MUX1 602 selects the DATA1 input to supply Operand1.

3) MUX2 704 selects the BROADCAST2 input to supply
Operand2.

4) MUX3 410' selects the DATA3 input as the source of
one of the inputs to the adder/subtractor 508.

5) MUX4 406'' selects the output of the multiplier as
the other input to the adder/subtractor 508.

These steps can be performed by writing the
binary number "000000100101" into the global control
register 308.

For the entire computation engine 300, at this
point the input signals would be configured in the
following fashion:

6) The sequence x1[0], x1[1], x1[2], ..., x1[N - 1] is
supplied to the computation engine DATA1 input, 1
per computation cycle.

7) The sequence x2[0], x2[1], x2[2], ..., x2[N - 1] is
supplied, 1 per computation cycle, to the
computation engine's DATA2 input, which is coupled to
the DATA2 input of the first computation cell and to
the BROADCAST2 input of each one of the M computation
cells.

After 1 computation cycle:

the first computation cell 302 would have
computed x1[0]x2[0].

After 2 computation cycles:

the first computation cell 302 would have
computed x1[0]x2[0] + x1[1]x2[1], and
the second computation cell 304 would have
computed x1[0]x2[1].

After N computation cycles:

the first computation cell 302 would have computed
x1[0]x2[0] + x1[1]x2[1] + ... + x1[N - 1]x2[N - 1] = y_x1x2[0],

the second computation cell 304 would have computed
x1[0]x2[1] + x1[1]x2[2] + ... + x1[N - 2]x2[N - 1] = y_x1x2[1],

and the Nth computation cell (306 assuming N=M)
would have computed x1[0]x2[N - 1] = y_x1x2[N - 1].

At this point, the computation engine 300 can
be reconfigured, e.g., by writing "000000000100" into the
global control register 308, so that in each of the
computation cells:

8) MUX3 410' selects the DATA3 input to supply one of
the inputs to the adder/subtractor 508.

9) MUX4 406'' selects Constant (0) as the other input
to the adder/subtractor 508.

The output of the computation engine 300 can be
used to shift out the cross-correlation sequence
y_x1x2[N - 1], y_x1x2[N - 2], ..., y_x1x2[1], y_x1x2[0]. The
number of computation cycles it takes to compute this
cross-correlation sequence is N. It takes an additional N
cycles to read out the result from the computation engine
300, assuming the engine 300 has N computation cells or
the output is taken from the Nth computation cell 700 in
the sequence of M computation cells.

Scalability Of Cross-Correlation Functionality 

The computation engine 300 of the present
invention is scalable. A computation engine 200 or 300
with N computation cells can be used to compute
correlation sequences shorter or longer than N.

To compute a cross-correlation of two
sequences, each sequence including I elements, e.g.,
numbers, where I < N, the computation engine is loaded
with the sequences of I numbers, the cross-correlation
sequence is computed, and then the computation results
stored in the N computation cells are shifted out of the
computation engine. N-I of the values shifted out of the
computation engine are not used, e.g., they are
discarded, while the remaining I values representing the
cross-correlation result are used. In one particular
embodiment, the first N-I values read out of the
computation engine are discarded while the remaining I
values are supplied to the processor 102 as the
correlation result.
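A Python sketch of the I < N case follows. It assumes that x1 shifts one cell per cycle along the STORAGE1 chain while x2 is broadcast to every cell, and that results leave the last cell first during readout. Under those assumptions cells I through N - 1 never see a nonzero DATA1 operand, so they hold 0, and discarding the first N - I values read out leaves exactly the I correlation values. The function name is hypothetical.

```python
def correlate_short(x1, x2, num_cells):
    """Cross-correlate two I-element sequences (I <= num_cells) on an engine
    with num_cells cells, shift out every cell, and keep the last I values.

    Returns the kept values in readout order: y[I - 1], ..., y[0].
    """
    i_len = len(x1)
    if len(x2) != i_len or i_len > num_cells:
        raise ValueError("need two sequences of equal length I <= N")
    storage1 = [0] * num_cells
    storage3 = [0] * num_cells
    for a, b in zip(x1, x2):                # compute phase: I cycles
        operand1 = [a] + storage1[:-1]      # x1 moves one cell per cycle
        for k in range(num_cells):
            storage3[k] += operand1[k] * b  # x2 is broadcast to every cell
        storage1 = operand1
    read_out = storage3[::-1]               # readout: last cell first, N values
    return read_out[num_cells - i_len:]     # discard the first N - I values
```

For x1 = [1, 2] and x2 = [3, 4] on 4 cells, the readout is 0, 0, 4, 11 and the kept values are y[1] = 4 and y[0] = 11.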



Consider for example the case where a
cross-correlation result is to be generated from two input
sequences which are longer than N, e.g., each sequence
having 2N elements. With the computation engine 200,
300, each computation cell 700 can be configured in the
following fashion to compute the cross-correlation
sequence of 2N numbers:

1) STORAGE1 402, STORAGE2 706, and STORAGE3 414 are
initialized to contain 0.

2) MUX1 602 selects the DATA1 input to supply Operand1.

3) MUX2 704 selects the BROADCAST2 input to supply
Operand2.

4) MUX3 410' selects the DATA3 input to supply one of
the inputs to the adder/subtractor 508.

5) MUX4 406'' selects the output of the multiplier 404
as the other input to the adder/subtractor 508.

For the entire computation engine 300, the
input signals are configured in the following fashion:

6) The first sequence x1[0], x1[1], x1[2], ..., x1[2N - 1]
is fed to the computation engine's DATA1 input, 1
per computation cycle.

7) The second sequence x2[0], x2[1], x2[2], ..., x2[2N - 1]
is fed, 1 per computation cycle, to the computation
engine's DATA2 input and is thus supplied to the DATA2
input of the first computation cell 302 in the sequence
of computation cells 302, 304, 306.

After 2N computation cycles:
the first computation cell 302 would have
computed:

x1[0]x2[0] + x1[1]x2[1] + ... +
x1[2N - 1]x2[2N - 1] = y_x1x2[0]

the second computation cell 304 would have
computed:

x1[0]x2[1] + x1[1]x2[2] + ... +
x1[2N - 2]x2[2N - 1] = y_x1x2[1]

the Nth computation cell 306 would have
computed:

x1[0]x2[N - 1] + x1[1]x2[N] + ... + x1[N]x2[2N - 1]
= y_x1x2[N - 1]

At this point, the computation engine 300 can
be reconfigured so that in each of the computation cells
302, 304, 306:

8) MUX3 410' selects the DATA3 input to supply one of
the inputs to the adder/subtractor 508.

9) MUX4 406'' selects the logic value 0 as the other
input to the adder/subtractor 508.

The output of the computation engine 300 can be
used to shift out the cross-correlation sequence
y_x1x2[N - 1], y_x1x2[N - 2], ..., y_x1x2[1], y_x1x2[0]. This
is half of the cross-correlation sequence for the 2N
input. To complete the 2nd half of the cross-correlation
sequence, the computation cells are reconfigured as
follows:
10) The contents of STORAGE1 402, STORAGE2 706, and
STORAGE3 414 are cleared so that they contain the
value 0.

11) MUX1 602, MUX2 704, MUX3 410', and MUX4 406''
are configured as in steps 2 to 5.

For the entire computation engine 300, the
input signals are then configured in the following
fashion:



12) The first sequence x1[0], x1[1], x1[2], ..., x1[N - 1]
is fed to the DATA1 input of the computation
engine, 1 per computation cycle.

13) The second sequence x2[N], x2[N + 1], x2[N + 2], ...,
x2[2N - 1] is also fed, 1 per computation cycle, to
the computation engine's DATA2 signal input, which
is coupled to the DATA2 input of the first
computation cell 302 and to the BROADCAST2 signal
input of each one of the M computation cells 302,
304, 306.

After N computation cycles:



The first computation cell 302 would have
computed:

x1[0]x2[N] + x1[1]x2[N + 1] + ... +
x1[N - 1]x2[2N - 1] = y_x1x2[N]

The second computation cell 304 would have
computed:

x1[0]x2[N + 1] + x1[1]x2[N + 2] + ... +
x1[N - 2]x2[2N - 1] = y_x1x2[N + 1]

The Nth computation cell (306 assuming N=M)
would have computed:

x1[0]x2[2N - 1] = y_x1x2[2N - 1]

The output of the computation engine 300 can be
used to shift out the cross-correlation sequence
y_x1x2[2N - 1], y_x1x2[2N - 2], ..., y_x1x2[N + 1],
y_x1x2[N]. This is the 2nd half of the cross-correlation
sequence for the 2N input. The total number of
computation cycles it takes to compute this
cross-correlation sequence is 3N, assuming the
computation engine includes N computation cells (N=M).
It takes an additional 2N cycles to read out the result
from the computation engine 300.

In general, this computation method can be extended to compute the correlation sequence of $Y \times N$ numbers. The computation is divided into Y iterations, and N correlation values are computed in each iteration. The first iteration uses $Y \times N$ computation cycles, the second uses $(Y-1) \times N$ cycles, the third uses $(Y-2) \times N$ cycles, and the final (Yth) iteration uses N cycles, assuming a computation engine with N computation cells. Therefore, using an N-cell computation engine 300, a correlation sequence of $Y \times N$ numbers can be computed in the following number of computation cycles:

$$\sum_{k=1}^{Y} kN = \frac{Y(Y+1)}{2}\,N$$

An additional $Y \times N$ cycles are used to read out the result from the systolic computation engine.
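The cycle count above can be checked with a few lines of Python. This sketch simply sums the per-iteration counts stated in the text and compares them with the closed form; for Y = 2 it reproduces the 3N compute cycles and 2N read-out cycles given for the 2N-sample case. The function names are hypothetical.

```python
def correlation_cycle_count(y_blocks, n_cells):
    """Return (compute_cycles, readout_cycles) for a Y*N-number correlation."""
    # Iteration i (1-based) takes (Y - i + 1) * N cycles: YN, (Y-1)N, ..., N.
    compute = sum((y_blocks - i + 1) * n_cells for i in range(1, y_blocks + 1))
    readout = y_blocks * n_cells  # Y*N results read out in Y*N cycles
    return compute, readout


def correlation_cycle_closed_form(y_blocks, n_cells):
    """Closed form N * Y * (Y + 1) / 2; Y * (Y + 1) is always even."""
    return n_cells * y_blocks * (y_blocks + 1) // 2


print(correlation_cycle_count(2, 64))  # (192, 128), i.e. 3N and 2N for N = 64
```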

Sorting Functionality 

The computation engine 300 can also be used to sort a list of numbers. Various published sorting algorithms are available, with the "fast" ones having an execution order of $O(N \log_2 N)$, which means that the number of computation cycles the algorithm needs is proportional to $N \log_2 N$, where N is the number of entries to be sorted. A slow algorithm might have an execution order of $O(N^2)$.

The dominant cost of a sorting algorithm is usually the number of comparisons it must make between entries in order to perform the sort.

With the computation engine of the present invention, N comparisons can be made simultaneously per computation cycle, assuming the computation engine 300 includes N computation cells (N = M). Each computation cell 302, 304, 306 can compare its content with the current entry in the list of numbers being sorted to determine that entry's proper location in the final, sorted list.
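To put these orders of growth into numbers, the short sketch below compares comparison counts for an $O(N \log_2 N)$ and an $O(N^2)$ algorithm with the N computation cycles the engine needs when N = M cells each perform one comparison per cycle. Constant factors are ignored, the engine figure excludes the N read-out cycles, and the function name is hypothetical.

```python
import math


def sort_cost_comparison(n):
    """Illustrative counts for sorting n entries (constant factors ignored)."""
    fast = n * math.log2(n)   # O(N log2 N) comparisons for a fast sequential sort
    slow = n * n              # O(N^2) comparisons for a slow sequential sort
    engine_cycles = n         # N cells: N comparisons per cycle, one entry per cycle
    return fast, slow, engine_cycles


print(sort_cost_comparison(1024))  # (10240.0, 1048576, 1024)
```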
To perform such a sorting algorithm, the computation engine 300 can be configured in the following fashion:

1) MUX1 602' selects the BROADCAST1 signal input to supply Operand1.

2) MUX2 704 selects the BROADCAST2 signal input to supply Operand2.

3) STORAGE3 414 stores both the entry and its associated index in the unsorted list. This is possible because STORAGE3 414 has approximately twice the bit width required to store any entry in the unsorted list. STORAGE3 414 can be split to store the index of the entry in the top half (most significant bits) and the entry itself in the bottom half (least significant bits).

4) MUX3 410' is controlled by the cascade control input signal (set to 0 in the case of the first computation cell 302 and received from the previous computation cell for each of the other computation cells) and by the cascade control output of the current computation cell, obtained from comparator 708:

• If the comparator result indicates that Operand2 is greater than the number portion of the DATA3 input signal, then MUX3 410' selects the DATA3 input signal as one input to the adder.

• If the comparator result indicates that Operand2 is less than the number portion of the DATA3 input signal AND the cascade control signal from the previous computation cell also indicates so, then MUX3 410' selects the DATA3 input signal as one input to the adder.

• If the comparator result indicates that Operand2 is less than the number portion of the DATA3 input signal AND the cascade control input signal from the previous computation cell indicates that Operand2 was greater than the number portion of the DATA3 input signal in the previous computation cell, then MUX3 410' selects Operand2 (with 0 prepended in the index portion) as one input to the adder.
5) MUX4 406'' is controlled by the cascade control input signal received from the previous computation cell and by the comparator result, i.e., the cascade control output signal generated by the current computation cell:

• If the comparator result indicates that Operand2 is greater than the number portion of the DATA3 input signal, then MUX4 406'' selects Constant 0 as the other input to the adder 508.

• If the comparator result indicates that Operand2 is less than the number portion of the DATA3 input signal AND the cascade control input signal received from the previous computation cell also indicates so, then MUX4 406'' selects Constant 0 as the other input to the adder 508.

• If the comparator result indicates that Operand2 is less than the number portion of the DATA3 input signal AND the cascade control input signal received from the previous computation cell indicates that Operand2 was greater than the number portion of the DATA3 input signal in the previous computation cell, then MUX4 406'' selects Operand1 (with 0 appended in the entry portion) as the other input to the adder.

The combination of the inputs that MUX3 410' and MUX4 406'' select for the adder has the following effect:

• If the comparator result indicates that Operand2 is greater than the number portion of the DATA3 input signal, then the DATA3 input signal is stored back into STORAGE3 414.

• If the comparator result indicates that Operand2 is less than the number portion of the DATA3 input signal AND the cascade control signal received from the previous computation cell also indicates so, then the DATA3 input signal is stored in STORAGE3 414.

• If the comparator result indicates that Operand2 is less than the number portion of the DATA3 input signal AND the cascade control signal received from the previous computation cell indicates that Operand2 was greater than the number portion of the DATA3 input signal in the previous computation cell, then Operand2 and its associated index are stored into STORAGE3 414.
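Items 3 to 5 and the summary above can be made concrete with a small Python model of one cell's adder-input selection. It packs the index into the upper half of a STORAGE3 word and the entry into the lower half, and it reproduces the adder trick implied by the MUX choices: because Operand2 arrives with a zero index portion and Operand1 with a zero entry portion, their sum is simply the packed word. Assumptions: entries are unsigned and fit in `entry_bits` bits, equality is treated as "less than" (the text does not cover ties), and the cascade signal is represented as a boolean meaning "the previous cell found Operand2 greater"; how the first cell's cascade input of 0 maps onto this boolean is not specified here. The helper names are hypothetical, and the model covers only the input selection, not the data movement between cells.

```python
def pack(index, entry, entry_bits):
    """STORAGE3 layout from item 3: index in the upper half, entry in the lower half."""
    return (index << entry_bits) | entry


def number_portion(word, entry_bits):
    """Extract the entry (number portion) from a packed STORAGE3 word."""
    return word & ((1 << entry_bits) - 1)


def sort_cell_step(operand1, operand2, data3, cascade_in_greater, entry_bits):
    """One cell's adder inputs per items 4 and 5 (hypothetical model).

    operand1: index from BROADCAST1; operand2: entry from BROADCAST2;
    data3: packed word on the DATA3 input; cascade_in_greater: True if the
    previous cell reported Operand2 greater than its number portion.
    Returns (value stored into STORAGE3, cascade output of this cell).
    """
    greater = operand2 > number_portion(data3, entry_bits)  # comparator 708
    if greater or not cascade_in_greater:
        mux3, mux4 = data3, 0                   # DATA3 + Constant 0: keep DATA3
    else:
        mux3 = operand2                         # entry with a zero index portion
        mux4 = operand1 << entry_bits           # index with a zero entry portion
    return mux3 + mux4, greater                 # adder output, cascade output
```

For example, with 8-bit entries and DATA3 holding index 2 and entry 50, broadcasting index 5 and entry 30 while the previous cell reports "greater" stores `pack(5, 30, 8)`; with the opposite cascade input the cell stores DATA3 unchanged.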



WO 02/12978 PCT/US01/24667

The above steps can be performed by simply writing "000010101111" into the global control register 308.

For the computation engine 300 as a whole, the input signals are configured in the following fashion:

6) The sequence 0, 1, 2, ..., N - 1, serving as the indices into the unsorted list, is fed, one per computation cycle, to the DATA1 signal input, thereby supplying the signal to the BROADCAST1 input of each computation cell in the computation engine 300.

7) The sequence x[0], x[1], x[2], ..., x[N - 1], forming the entries of the unsorted list, is fed, one per computation cycle, to the DATA2 input of the computation engine 300, thereby supplying the signal to the BROADCAST2 input of each of the computation cells in the computation engine 300.

The configuration of the computation engine 300 effectively implements an insertion sort algorithm. After N computation cycles, the systolic computation engine can be reconfigured so that in each computation cell:

8) MUX3 410' selects the DATA3 input signal as one input to the adder 508.

9) MUX4 406'' selects Constant 0 as the other input to the adder 508.
The output of the computation engine 300 can be used to shift out the sorted sequence of numbers, together with their associated indices in the unsorted sequence, from the largest to the smallest. The number of computation cycles used to complete the sorting is N. An additional N cycles are used to read out the result from the computation engine 300.
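For the overall result, the sketch below is a behavioral model of a standard systolic insertion sort that produces what the text describes: after one broadcast entry per cycle for N cycles, the N cells hold the entries with their original indices, largest first. It is not a transcription of the MUX rules above; it assumes each cell compares the broadcast entry with its own stored entry, that an empty cell holds negative infinity, and that ties are placed after existing equal entries. The names are hypothetical.

```python
NEG_INF = float("-inf")  # marks an empty cell (assumption)


def systolic_insertion_sort(values):
    """Behavioral model of an N-cell insertion sort.

    Returns a list of (entry, index) pairs, largest entry first.
    """
    n = len(values)
    cells = [(NEG_INF, None)] * n            # STORAGE3 contents, one per cell
    for index, entry in enumerate(values):   # one entry broadcast per cycle
        # Every comparator evaluates in parallel: is the new entry larger?
        greater = [entry > stored for stored, _ in cells]
        updated = []
        for i in range(n):
            cascade_in = greater[i - 1] if i > 0 else False  # first cell: 0
            if not greater[i]:
                updated.append(cells[i])          # keep current content
            elif cascade_in:
                updated.append(cells[i - 1])      # shift down from the previous cell
            else:
                updated.append((entry, index))    # insert the new entry here
        cells = updated
    return cells


print(systolic_insertion_sort([3, 1, 4, 1, 5]))
# [(5, 4), (4, 2), (3, 0), (1, 1), (1, 3)]
```

Because the stored entries are always in descending order, the comparator results form a run of False followed by a run of True, so exactly one cell inserts the new entry and every cell after it shifts down by one.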



FIR Filtering Functionality

With the computation engine 300, the engine's computation cells can be configured in the following fashion to compute an FIR (finite impulse response) filter output sequence:

1) STORAGE1 402 is initialized to contain the filter impulse response, i.e., the filter coefficients, in reverse order: the first computation cell 302 will have h[N - 1] in its STORAGE1 402, the second computation cell 304 will have h[N - 2] in its STORAGE1 402, and so on. Computation cell N will have h[0] in its STORAGE1 402. This will generally take N computation cycles to complete the configuration, e.g., the loading of the filter coefficients into the STORAGE1 elements of the individual computation cells.
2) STORAGE3 is initialized to contain 0 for each of the computation cells 302, 304, 306 in the computation engine 300.

3) MUX1 602 selects the DATA1 input signal to supply Operand1.

4) MUX2 704 selects the BROADCAST2 input to supply Operand2.

5) MUX3 410' selects the DATA3 input to provide one of the inputs to the adder/subtractor 508.

6) MUX4 406'' selects the output of the multiplier 404 as the other input to the adder/subtractor 508.

The computation engine 300 can be configured to perform step 1 by writing "000001000000" into the global control register 308. Step 2 can be accomplished by writing "001000000000" into the global control register 308. Steps 3 to 6 can be accomplished by writing "000000100101" into the global control register 308.

7) The sequence x[0], x[1], x[2], ..., x[N - 1], and so on, is fed, one per computation cycle, to the DATA2 input of the computation engine, which is coupled to the DATA2 input of the first computation cell 302 and to the BROADCAST2 input of each of the computation engine's computation cells 302, 304, 306.

8) The constant 0 is fed to the DATA3 input of the computation engine 300.
The output of the computation engine 300 can be used to read the filter output sequence y[0], y[1], ..., y[N - 2], y[N - 1], and so on.

The computation engine of the present invention can also be used to implement the convolution of two sequences, since a convolution can be expressed by the same equation as that used to represent the supported FIR filter discussed above.
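The FIR configuration above can be modeled behaviorally as a transposed-form FIR filter: each input sample is broadcast to every cell, each cell adds its coefficient times the sample to the partial sum arriving on DATA3 from the previous cell, and the last cell holds the output. Reading DATA3 as the previous cell's STORAGE3 value is an assumption, as are the function names. Feeding the sequence followed by N - 1 zeros gives the full linear convolution of h and x, which illustrates the convolution remark.

```python
def systolic_fir(h, x):
    """Hypothetical behavioral model of the FIR configuration described above."""
    coeff = list(reversed(h))      # STORAGE1: first cell holds h[N-1], last holds h[0]
    acc = [0] * len(h)             # STORAGE3 of every cell initialized to 0
    y = []
    for sample in x:               # one input sample broadcast per computation cycle
        # DATA3 of each cell is the previous cell's STORAGE3; engine DATA3 input is 0.
        prev = [0] + acc[:-1]
        acc = [p + c * sample for p, c in zip(prev, coeff)]
        y.append(acc[-1])          # the last cell's STORAGE3 is the engine output
    return y


def fir_direct(h, x):
    """Reference: y[n] = sum_k h[k] * x[n - k], with x[n] = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


print(systolic_fir([1, 2, 3], [1, 0, 0, 0]))   # impulse response: [1, 2, 3, 0]
print(systolic_fir([1, 2, 3], [1, 1, 0, 0]))   # convolution of [1, 2, 3] and [1, 1]: [1, 3, 5, 3]
```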
Parallel Multiply And Accumulate Functionality

The computation engine 300 implemented using computation cells 700 can also be configured as a parallel MAC unit capable of performing N multiply-and-accumulate operations at once (assuming N = M) by writing "000000000001" into the global control register 308. In such an application, N computation cycles are used to shift in the operands, e.g., by writing "110000000000" into the global control register, and N computation cycles are used to shift out the result. The shifting out of the result may be achieved by writing "001000000000" into the global control register 308. Thus, the computation engine 300 of the present invention can be used to provide high-speed MAC unit functionality to a microcontroller, DSP or other digital circuit.
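A minimal behavioral sketch of the parallel MAC mode follows, assuming one operand pair per cell and ignoring the N shift-in and N shift-out cycles. The `subtract` flag mirrors the "Parallel Multiply and Subtract" row of the Additional Functionality table; the function name is hypothetical.

```python
def parallel_mac(acc, a, b, subtract=False):
    """Every cell i performs acc[i] + a[i] * b[i] (or minus) in the same cycle."""
    if not len(acc) == len(a) == len(b):
        raise ValueError("one accumulator and one operand pair per cell are required")
    sign = -1 if subtract else 1
    return [s + sign * x * y for s, x, y in zip(acc, a, b)]


print(parallel_mac([0, 0, 0], [1, 2, 3], [4, 5, 6]))                   # [4, 10, 18]
print(parallel_mac([10, 10, 10], [1, 2, 3], [4, 5, 6], subtract=True))  # [6, 0, -8]
```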

Additional Functionality

The following table summarizes various functions, with their associated global control register encoding, that can be performed by a computation engine 300 implemented using multipurpose computation cells 700.
Function                          S1R  S2R  S3R  ASC  M1C  M2C  M3C  M4C
No Operation (NOP)                 0    0    0    0   00   00   00   00
Reset Storage1                     1    0    0    0   00   00   00   00
Reset Storage2                     0    1    0    0   00   00   00   00
Reset Storage3                     0    0    1    0   00   00   00   00
Shift Storage1                     0    0    0    0   01   00   00   00
Shift Storage2                     0    0    0    0   00   01   00   00
Shift Storage3                     0    0    0    0   00   00   01   00
Compute Correlations               0    0    0    0   01   10   00   01
Compute FIR                        0    0    0    0   00   10   01   01
Sort                               0    0    0    0   10   10   11   11
Parallel Multiply and Add          0    0    0    0   00   00   00   01
Parallel Multiply and Subtract     0    0    0    1   00   00   00   01



Note that some of the functions can be combined and performed together. For example, the reset Storage1, reset Storage2, and reset Storage3 functions can be performed together when "111000000000" is written into the global control register. Similarly, the shift Storage1 and shift Storage2 functions can be performed together when "000001010000" is written into the global control register.
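The table lends itself to a small encoder and decoder. The sketch below assumes the 12 bits are ordered exactly as the table's columns (S1R first, M4C last), which matches the Sort and Compute FIR words quoted earlier; it makes no assumption about what each abbreviation stands for, and the helper names are hypothetical. Combining is only allowed when the functions set different fields, which is the condition under which the combined words above are valid.

```python
FIELDS = [("S1R", 1), ("S2R", 1), ("S3R", 1), ("ASC", 1),
          ("M1C", 2), ("M2C", 2), ("M3C", 2), ("M4C", 2)]
WIDTH = sum(width for _, width in FIELDS)  # 12 bits


def encode(**values):
    """Build the control string from field values, most significant field first."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise KeyError(f"unknown field(s): {sorted(unknown)}")
    bits = ""
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        bits += format(value, f"0{width}b")
    return bits


def decode(word):
    """Split a 12-character control string into its fields."""
    if len(word) != WIDTH:
        raise ValueError(f"expected {WIDTH} bits, got {len(word)}")
    fields, pos = {}, 0
    for name, width in FIELDS:
        fields[name] = int(word[pos:pos + width], 2)
        pos += width
    return fields


def combine(*words):
    """Merge control words whose functions use different (non-zero) fields."""
    merged = {}
    for word in words:
        for name, value in decode(word).items():
            if value:
                if merged.get(name, 0):
                    raise ValueError(f"field {name} is set by more than one function")
                merged[name] = value
    return encode(**merged)


print(encode(M1C=2, M2C=2, M3C=3, M4C=3))                     # 000010101111 (Sort)
print(combine(encode(S1R=1), encode(S2R=1), encode(S3R=1)))  # 111000000000
```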

Variations on the above-described exemplary embodiments will be apparent to those skilled in the art in view of the above description of the invention. Such embodiments are considered to be part of the present invention.

For example, the computation engine of the present invention may, and in one embodiment does, include parallel outputs so that the processing result generated by each computation cell can be read out in parallel, thereby avoiding the need to shift out the computation result. In addition, the computation engine of the present invention can be configured and used to perform a wide variety of processing operations in addition to those specifically described herein. Furthermore, while voice processing applications have been described, the computation engine of the present invention may be used in any number of processing applications and is not limited to audio and/or voice data processing applications.
WHAT IS CLAIMED IS:

1. A digital signal processor, comprising:
a software programmable processing circuit for performing signal processing operations under software control; and
a computation engine coupled to said software programmable processing circuit for performing a plurality of digital signal processing operations including multiply and add operations in parallel, the computation engine including:
a plurality of first through Mth computation cells, where M is a positive integer greater than 2, each of the M computation cells including a multiplier and an adder circuit.

2. The digital signal processor of claim 1, wherein the first through Mth computation cells are coupled together in series, a data input of the first computation cell being coupled to said programmable processing circuit for receiving data to be processed, a data output of the Mth computation cell being coupled to said first software programmable processing circuit for supplying data thereto.

3. The digital signal processor of claim 2, wherein each computation cell further comprises:
a control value input for receiving a control value used to control the configuration of circuitry included in the computation cell.

4. The digital signal processor of claim 3, wherein each computation cell further comprises:
a comparator;
at least two storage elements; and
means for configuring the connections between the multiplier, adder, comparator and storage elements as a function of the control value supplied to the computation cell.

5. The digital signal processor of claim 4, wherein said means for configuring includes:
a plurality of multiplexers; and
control logic circuitry for generating multiplexer control signals from said control value, different bits of said control value being used to generate different multiplexer control signals.

6. The digital signal processor of claim 4, wherein the software programmable processing circuit and the computation engine are implemented as a single chip.

7. The digital signal processor of claim 1, wherein the software programmable processing circuit and the computation engine are implemented on the same piece of semi-conductor material.

8. The digital signal processor of claim 1, further comprising:
an additional software programmable processing circuit for performing signal processing operations under software control coupled to said computation engine.

9. The digital signal processor of claim 8, further comprising:
an input selection circuit for controlling the supply of data from said software programmable processing circuit and said additional software programmable processing circuit to the computation engine.

10. The digital signal processor of claim 9, wherein the input selection control circuit is responsive to a control signal from said software programmable processing circuit to supply data from a selected one of said software programmable processing circuit and said additional software programmable processing circuit to said computation engine at any given time.

11. A digital signal processor, comprising:
first and second programmable processing circuits; and
a computation engine coupled to said programmable processing circuits, the computation engine including a plurality of computation cells arranged to perform signal processing operations in parallel.
12. The digital signal processing circuit of claim 11, wherein said plurality of computation cells include more than two computation cells.

13. The digital signal processing circuit of claim 11, further comprising:
means for time sharing the computation engine between said first and second programmable processing units on a time shared basis.

14. The digital signal processing circuit of claim 13, wherein the computation cells are configurable, further comprising:
means for controlling the configuration of the computation cells to perform different processing operations at different times.

15. The digital signal processing circuit of claim 13, wherein the computation cells are configurable, further comprising:
means for controlling the configuration of the computation cells to perform one of a correlation operation and a sorting operation.

16. The digital signal processing circuit of claim 15, wherein the means for controlling the configuration includes a control register for storing a control value used to control the configuration of each computation cell in said plurality of computation cells.

17. The digital signal processing circuit of claim 16, wherein the plurality of computation cells includes at least 20 computation cells.

18. The digital signal processing circuit of claim 15, wherein the plurality of computation cells includes at least 240 computation cells.

19. The digital signal processor of claim 11, wherein the plurality of computation cells includes first through Mth computation cells, the first through Mth computation cells being coupled together in series, a data input of a first computation cell in the series of M computation cells being coupled to said first and second programmable processing circuits for receiving data to be processed, a data output of the Mth computation cell being coupled to said first and second software programmable processing circuits for supplying data thereto.

20. The digital signal processor of claim 19, wherein a controllable switch is used to couple the first and second programmable processing circuits to the data input of the first computation cell.

21. The digital signal processing circuit of claim 11, wherein each of the computation cells includes at least a multiplier and one adder.

22. The digital signal processing circuit of claim 21, wherein each computation cell further includes at least one storage device for storing the result of a processing operation performed by said computation cell.

23. A computation engine, comprising: 

a plurality of first through Mth computation cells for 
performing processing operations in parallel, where M is an 
integer greater than two, the first through Mth computation 
cells being coupled together in series, the first 
computation cell in the series of M computation cells 
including a data input of a first computation cell for 
receiving data to be processed by the series of M 
computation cells, the Mth computation cell including a 
data output for outputting data processed by the series of 
M computation cells, 

each computation cell comprising: 

a subtractor and a multiplier. 

24. The computation engine of claim 23, wherein each computation cell further comprises:
a storage device; and
means for configuring connections between the subtractor, multiplier and storage device.

25. The computation engine of claim 24, wherein each computation cell further includes a comparator; and
wherein the means for configuring the connections between the subtractor, multiplier and storage device is responsive to a control signal to configure the computation cell to perform part of a sorting operation.

26. The method of claim 24, wherein the subtractor in each of the computation cells is part of a configurable adder/subtractor circuit.

27. A method of using a configurable computation engine including a plurality of M computation cells, where M is an integer greater than two, to perform a digital signal processing operation, the method comprising:
configuring the computation cells within the computation engine to perform correlation processing operations;
operating the computation engine to perform a correlation operation;
reconfiguring the computation cells within the computation engine to perform filtering processing operations; and
operating the computation engine to perform a filtering operation.

28. The method of claim 27, further comprising the steps of:
reconfiguring the computation cells within the computation engine to perform sorting processing operations; and
operating the computation engine to perform a sorting operation.

29. The method of claim 26, further comprising the step of:
supplying digital audio data corresponding to a first voice channel to said computation engine prior to performing said correlation operation, said correlation operation being performed on said supplied digital audio data.

30. The method of claim 29, wherein said correlation operation is a cross-correlation operation.

31. The method of claim 29, wherein said correlation operation is an auto-correlation operation.

32. The method of claim 29, further comprising the step of:
supplying digital audio data corresponding to a second voice channel to said computation engine prior to performing said filtering operation, said filtering operation being performed on said supplied digital audio data.

33. The method of claim 32, wherein said filtering operation is a finite impulse response filtering operation.

34. The method of claim 27, further comprising the steps of:
supplying digital data from a first programmable processor to the computation engine to be used in performing the correlation operation; and
supplying digital data from a second programmable processor to the computation engine to be used in performing the filtering operation.
35. The method of claim 27, wherein M is at least 20, the step of configuring the computation cells within the computation engine including the step of supplying a configuration control value to each of the M computation cells.

36. The method of claim 35, wherein the step of supplying a configuration control value to each of the M computation cells includes the step of supplying the same multi-bit control value to each of the M computation cells.

37. A method of using a configurable computation engine including a plurality of M computation cells, to perform a digital signal processing operation, where M is an integer greater than one, the method comprising:
configuring the computation cells within the computation engine to perform sorting processing operations;
operating the computation engine to perform a sorting operation;
reconfiguring the computation cells within the computation engine to perform filtering processing operations; and
operating the computation engine to perform a filtering operation.

38. The method of claim 37, further comprising the steps of:
supplying digital data from a first programmable processor to the computation engine to be used in performing the sorting operation; and
supplying digital data from a second programmable processor to the computation engine to be used in performing the filtering operation.

39. The method of claim 38, wherein M is at least 8, the step of configuring the computation cells within the computation engine including the step of supplying a configuration control value to each of the M computation cells.

40. The method of claim 39, wherein the step of supplying a configuration control value to each of the M computation cells includes the step of supplying the same multi-bit control value to each of the M computation cells.

41. A method of using a configurable computation engine including a plurality of M computation cells, to perform a digital signal processing operation, wherein M is a positive integer greater than 1, the method comprising:
configuring the computation cells within the computation engine to perform correlation processing operations;
operating the computation engine to perform a correlation operation;
reconfiguring the computation cells within the computation engine to perform sorting processing operations; and
operating the computation engine to perform a sorting operation.

42. The method of claim 41, further comprising the steps of:
supplying digital data from a first programmable processor to the computation engine to be used in performing the correlation operation; and
supplying digital data from a second programmable processor to the computation engine to be used in performing the sorting operation.

43. The method of claim 41, wherein M is at least 8, the step of configuring the computation cells within the computation engine including the step of supplying a configuration control value to each of the M computation cells.

44. The method of claim 42, wherein the step of supplying a configuration control value to each of the M computation cells includes the step of supplying the same multi-bit control value to each of the M computation cells.